11/09/2017
Examples: random forests, neural networks, etc.
Benefit: offer better predictive ability than more interpretable models such as linear regression models, regression and classification trees, etc.
Disadvantages: difficult to interpret; it is often unclear why the model makes a particular prediction, which makes the predictions hard to trust
LIME (Local Interpretable Model-agnostic Explanations)
The lime package implements the method in R.
Figure 1 in Ribeiro et al.
Figure 3 in Ribeiro et al.
Figure 2 in Ribeiro et al.
lime R package supports:
caret and mlr packages
Figure 4 in Ribeiro et al.
```r
# Iris dataset
iris[1:3, ]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
```
```r
# Split up the data set into training and testing datasets
iris_test  <- iris[1:5, 1:4]
iris_train <- iris[-(1:5), 1:4]

# Create a vector with the responses for the training dataset
iris_lab <- iris[[5]][-(1:5)]
```
```r
# Create random forest model on iris data
library(caret)
rf_model <- train(iris_train, iris_lab, method = 'rf')

# Can use the complex model to make predictions
Pred   <- predict(rf_model, iris_test)
Actual <- iris[1:5, 5]
data.frame(iris_test, Pred, Actual)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width   Pred Actual
## 1          5.1         3.5          1.4         0.2 setosa setosa
## 2          4.9         3.0          1.4         0.2 setosa setosa
## 3          4.7         3.2          1.3         0.2 setosa setosa
## 4          4.6         3.1          1.5         0.2 setosa setosa
## 5          5.0         3.6          1.4         0.2 setosa setosa
```
```r
# Create an explainer object
library(lime)
explainer <- lime(iris_train, rf_model)

# Sepal length quantiles obtained from training data
explainer$bin_cuts$Sepal.Length
##   0%  25%  50%  75% 100%
##  4.3  5.2  5.8  6.4  7.9
```
```r
# Probability distribution for sepal length
explainer$feature_distribution$Sepal.Length
##
##         1         2         3         4
## 0.2758621 0.2413793 0.2413793 0.2413793
```
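These quantities can be checked directly in base R: assuming lime's default binning, the bin cuts for a numeric feature are the quartiles of the training data, and the feature distribution is the relative frequency of each bin. A quick sketch, recreating the training split from above:

```r
# Recreate the training split used earlier (drop the first 5 rows of iris)
iris_train <- iris[-(1:5), 1:4]

# Quartile bin cuts for Sepal.Length (should match explainer$bin_cuts above)
cuts <- quantile(iris_train$Sepal.Length, probs = seq(0, 1, 0.25))
cuts

# Bin each training value and compute the relative frequency per bin
# (should match explainer$feature_distribution above)
bins <- cut(iris_train$Sepal.Length, breaks = unique(cuts),
            include.lowest = TRUE, labels = FALSE)
prop.table(table(bins))
```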
Histograms of predictor variables from training data
Cases sampled from the training variable distributions for case 1 of the testing data, with the random forest's predicted class probabilities:

```r
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Case.Number setosa
## 1     5.314807     4.04260     5.856493   1.5136405           1  0.000
## 2     4.546596     2.73312     1.028409   0.4986859           1  1.000
## 3     5.467831     2.98690     6.565054   0.2349578           1  0.468
## 4     6.935536     2.38896     5.548015   0.5669754           1  0.468
##   versicolor virginica
## 1      0.234     0.766
## 2      0.000     0.000
## 3      0.116     0.416
## 4      0.082     0.450
```
We need to determine how similar a sampled case is to the observed case in the testing data
Case 1 from testing data:
```r
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
```
First sample from training data variable distributions associated with case 1 of testing data:
```r
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     5.314807      4.0426     5.856493    1.513641
```
LIME uses the exponential kernel function \[\pi_{x_{obs}}(x_{sampled}) = \exp\left\{\frac{-D(x_{obs}, \ x_{sampled})^2}{\sigma^2}\right\}\] where
\(x_{obs}\): observed data vector to predict
\(x_{sampled}\): sampled data vector from distribution of training variables
\(D(\cdot \ , \ \cdot)\): distance function, such as Euclidean distance, cosine distance, etc.
\(\sigma\): width (default set to 0.75 in lime)
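As an illustration of this kernel (not lime's exact internal computation, which encodes the features before measuring distance), the similarity between case 1 and the first sampled case can be sketched in base R with Euclidean distance and the default width:

```r
# Exponential kernel from the formula above: exp(-D(x_obs, x_sampled)^2 / sigma^2)
exp_kernel <- function(x_obs, x_sampled, sigma = 0.75) {
  d <- sqrt(sum((x_obs - x_sampled)^2))  # Euclidean distance
  exp(-d^2 / sigma^2)
}

x_obs     <- c(5.1, 3.5, 1.4, 0.2)                    # case 1 from the testing data
x_sampled <- c(5.314807, 4.0426, 5.856493, 1.513641)  # first sampled case
exp_kernel(x_obs, x_sampled)
```

Sampled cases close to the observed case get weights near 1; distant cases get weights near 0.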
\[\mbox{P(setosa)} \sim \mbox{Sepal.Length} + \mbox{Sepal.Width} + \mbox{Petal.Length} + \mbox{Petal.Width}\]
\[\mbox{P(versicolor)} \sim \mbox{Sepal.Length} + \mbox{Sepal.Width} + \mbox{Petal.Length} + \mbox{Petal.Width}\]
\[\mbox{P(virginica)} \sim \mbox{Sepal.Length} + \mbox{Sepal.Width} + \mbox{Petal.Length} + \mbox{Petal.Width}\]
lime supports several feature selection methods (e.g., forward selection, highest weights, LASSO path) for choosing which predictors enter the simple model.
lime is programmed to use ridge regression as the "simple" model, e.g.,
\[\mbox{P(setosa)} \sim \mbox{Petal.Length} + \mbox{Sepal.Length} \]
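As a sketch of what a weighted ridge fit looks like, here is the closed-form solution applied to hypothetical toy data (the package itself fits ridge models to the permuted samples, weighted by their kernel similarities; this is only an illustration of the estimator):

```r
# Closed-form weighted ridge regression: minimize
#   sum_i w_i * (y_i - x_i' beta)^2 + lambda * ||beta||^2
# (for simplicity the intercept is penalized here as well)
weighted_ridge <- function(X, y, w, lambda = 0.01) {
  X1 <- cbind(Intercept = 1, X)  # add intercept column
  W  <- diag(w)
  solve(t(X1) %*% W %*% X1 + lambda * diag(ncol(X1)),
        t(X1) %*% W %*% y)
}

# Toy data: response driven entirely by the first predictor
set.seed(1)
X <- matrix(rnorm(40), ncol = 2,
            dimnames = list(NULL, c("Petal.Length", "Sepal.Length")))
y <- 0.8 * X[, "Petal.Length"]
w <- runif(20)                   # stand-in for kernel similarity weights
weighted_ridge(X, y, w)
```

With a small penalty, the fit recovers a coefficient near 0.8 for Petal.Length and near 0 for Sepal.Length.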
The explain function in lime generates the explanations:

```r
# Explain new observation
explanation <- explain(iris_test, explainer, n_labels = 1,
                       n_features = 2, n_permutations = 5000,
                       feature_select = 'auto')
explanation[1:2, 1:6]
##       model_type case  label label_prob  model_r2 model_intercept
## 1 classification    1 setosa          1 0.3598835       0.2457926
## 2 classification    1 setosa          1 0.3598835       0.2457926
```
```r
explanation[1:2, 7:10]
##   model_prediction     feature feature_value feature_weight
## 1        0.6825058 Sepal.Width           3.5      0.0292960
## 2        0.6825058 Petal.Width           0.2      0.4074172
```
```r
explanation[1:2, 11:13]
##         feature_desc               data prediction
## 1  3.3 < Sepal.Width 5.1, 3.5, 1.4, 0.2    1, 0, 0
## 2 Petal.Width <= 0.4 5.1, 3.5, 1.4, 0.2    1, 0, 0
```
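These columns fit together: model_prediction is the local ridge model's fitted value for the case, and here it equals model_intercept plus the two feature weights, since both selected binary features are active for case 1 (3.3 < Sepal.Width and Petal.Width <= 0.4 both hold):

```r
# Reconstruct the local model's prediction from the output above:
# intercept plus the weights of the active binary features
model_intercept <- 0.2457926
feature_weights <- c(Sepal.Width = 0.0292960, Petal.Width = 0.4074172)
model_intercept + sum(feature_weights)  # 0.6825058, matching model_prediction
```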
```r
plot_features(explanation)
```
Based on these explanations, how is the neural network distinguishing between wolves and huskies?
| Response | Without Explanations | With Explanations |
|---|---|---|
| Trusted the bad model | 10 out of 27 | 3 out of 27 |
| Mentioned snow as a potential feature | 12 out of 27 | 25 out of 27 |
Original paper: Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should I trust you?”: Explaining the predictions of any classifier. In Knowledge Discovery and Data Mining (KDD), 2016. https://arxiv.org/abs/1602.04938
Informative Video: https://www.youtube.com/watch?v=hUnRCxnydCc
Python Code on Marco's GitHub: https://github.com/marcotcr/lime
lime R Package on Thomas Pedersen's GitHub: https://github.com/thomasp85/lime
lime Vignette: https://github.com/thomasp85/lime/blob/master/vignettes/Understanding_lime.Rmd